parallel primer

Simple parallelizing

Here’s an example of a piece of code that estimates (pi) up to a certain number of iterations from the series approximation. The while loop is used to save from creating a very large sequence vector.

Show the code
piestimate <- function(n){
  estimate=0
  i=0
  while (i<n){
    estimate=estimate+4*(-1)^i/(2*i+1)
    i=i+1
  }
  return(estimate)
}


iterations = seq(100,200,length.out=33)
estimates = sapply(iterations,piestimate)
plot(iterations,estimates)
abline(h=pi)

This was reasonably fast on my computer. Through a bit of trial and error, I found that it takes almost 2 seconds to iterate 10 million times

Show the code
system.time((piestimate(1e7)))
   user  system elapsed 
   2.45    0.08    2.64 

Now, suppose I wanted to run this for the next 10 iterations after 10 million, that’ll take just under 20 seconds.

Open up your resource monitor/task manager/activity monitor and you’ll see one processor is at 100%, doing all the work, so they’re all executed one after each other.

Show the code
iterations = 1e7+(1:10)
system.time(lapply(iterations,piestimate))
   user  system elapsed 
  8.450   0.001   8.474 

To have this done in parallel, consider running this explicitly on more cores. Again, look at how many CPUs at 100%. If you have more than 10 cores, only 10 will max out because we only have 10 jobs.

Show the code
iterations = 1e7+(1:10)


library(parallel)
cl=makeCluster(10)
system.time(parLapply(cl,iterations,piestimate))
   user  system elapsed 
  0.004   0.000   2.266 
Show the code
stopCluster(cl)

You can see that we had a big speedup.

Multiple parameters

Suppose that we a function that should take multiple parameters. We can achieve this one of two ways. First is to directly pass the parameters in the function. By default, *apply functions take a single parameter (which can be a list/vector) so that’s often the easist approach.

Consider this simulation that finds the distribution of pvalues when the null hypothesis is violated.

Show the code
ttestSim <- function(simulationparameters){
  numReps=100
  #simulationparameters is a list
  #slots are named sampleSize, trueMean, direction
  oneSimulation<-function(){
    randomData <- with(simulationparameters,
                       rnorm(n=sampleSize,mean=trueMean)
    )
    ttestobj = with(simulationparameters,t.test(randomData,alternative=direction))
    with(ttestobj,p.value)
  }
  replicate(n=numReps,oneSimulation())
}
ttestSim(list(sampleSize=10,trueMean=0.5,direction="two.sided"))
  [1] 0.5950001086 0.0152322207 0.1401886745 0.4050378513 0.3559611484
  [6] 0.3736753737 0.2222294167 0.1061121533 0.0965678520 0.4178887202
 [11] 0.5275600387 0.0267108628 0.0131791251 0.6801029804 0.1557059603
 [16] 0.5686253867 0.0132573408 0.1281122508 0.0014665333 0.4202809045
 [21] 0.2889091005 0.0768409891 0.1784406816 0.2710897946 0.0430325355
 [26] 0.3870526340 0.0347140301 0.1068830910 0.1864747231 0.8704413047
 [31] 0.1372872573 0.0675301747 0.3419841262 0.0549235125 0.6140789497
 [36] 0.0007396692 0.1352059075 0.0108456206 0.0774051665 0.0436632704
 [41] 0.4668137506 0.0080653637 0.0260376127 0.0111487548 0.4785024116
 [46] 0.1635783565 0.0001097316 0.2085292392 0.2152338434 0.0042317840
 [51] 0.0055595263 0.0129624351 0.3556127374 0.5572777727 0.2338654176
 [56] 0.5993280691 0.0107405711 0.3099691875 0.0758131520 0.8237452968
 [61] 0.1467508701 0.3394300338 0.4014614697 0.0220155347 0.0403249756
 [66] 0.0487214328 0.0766289213 0.4262961723 0.3712724295 0.3463536905
 [71] 0.2306921807 0.8353700010 0.5395468667 0.2341783725 0.0485504255
 [76] 0.1404217079 0.2800928489 0.5440164079 0.1450523548 0.0745501888
 [81] 0.2981062743 0.1163845282 0.0915298436 0.0719958960 0.5520991780
 [86] 0.5090013180 0.0306707707 0.5233365900 0.1179675460 0.1730010262
 [91] 0.0081789717 0.4470722312 0.0029530788 0.0183259041 0.5145758673
 [96] 0.4918368236 0.0188225665 0.3542533393 0.1640264639 0.1515443020

We can create a table that contains all simulation scenarios that we might be interested in and then turn each row of that into a list

Show the code
simScenarios=expand.grid(
  sampleSize=c(5,10,25,50),
  trueMean=c(0,.25,.5,.75,1),
  direction=c("two.sided","greater","less"),stringsAsFactors = F
)

#turn each row into its own list
scenariosList=split(simScenarios,seq(nrow(simScenarios)))

We can quickly test that this works by running through the first three rows

Show the code
simScenarios.small = scenariosList[1:3]
simScenarios.small
$`1`
  sampleSize trueMean direction
1          5        0 two.sided

$`2`
  sampleSize trueMean direction
2         10        0 two.sided

$`3`
  sampleSize trueMean direction
3         25        0 two.sided
Show the code
lapply(simScenarios.small,ttestSim)
$`1`
  [1] 0.346221517 0.185833244 0.181576829 0.576699718 0.968561776 0.228919956
  [7] 0.008531703 0.053519196 0.810551929 0.218225628 0.552369468 0.956100627
 [13] 0.842805001 0.874090050 0.584499989 0.696482593 0.454939587 0.272944745
 [19] 0.276924894 0.088652115 0.191430418 0.762522665 0.471271435 0.229665559
 [25] 0.217797060 0.111630169 0.198796123 0.107493294 0.325309231 0.867020393
 [31] 0.386322209 0.523749568 0.797370398 0.124341318 0.793495457 0.904533696
 [37] 0.529866865 0.015818051 0.296706244 0.650592819 0.094015709 0.932519238
 [43] 0.624418826 0.321238910 0.077074425 0.064728262 0.729090253 0.825717513
 [49] 0.931235382 0.049002772 0.722065549 0.150082832 0.632002360 0.368142448
 [55] 0.114645917 0.661504834 0.870597202 0.968235187 0.698786590 0.592429739
 [61] 0.365946761 0.506386457 0.036865072 0.721761026 0.041446543 0.762576513
 [67] 0.742632348 0.061634859 0.430305181 0.221551706 0.002977898 0.294447096
 [73] 0.765498807 0.646199113 0.260977299 0.934195559 0.931170888 0.466315192
 [79] 0.756718826 0.392930747 0.476523133 0.260598019 0.887333842 0.602422515
 [85] 0.440060880 0.916874070 0.266457662 0.227117176 0.130732591 0.111765521
 [91] 0.613299549 0.386866009 0.298855063 0.681453297 0.778020616 0.488669451
 [97] 0.623729047 0.832760869 0.335702563 0.480100251

$`2`
  [1] 0.94661563 0.50124612 0.08963887 0.07499694 0.30220992 0.58691305
  [7] 0.04895814 0.65837126 0.10808932 0.93098612 0.51175802 0.37406877
 [13] 0.13010917 0.97104010 0.06045302 0.21774473 0.54474362 0.22713581
 [19] 0.96955262 0.66154366 0.16996336 0.70160249 0.98196546 0.59313929
 [25] 0.11665511 0.52893677 0.40618547 0.02462248 0.72940858 0.52894116
 [31] 0.85722755 0.11819616 0.33289830 0.18950283 0.99602564 0.75830309
 [37] 0.90898156 0.64300427 0.69998294 0.02302440 0.12908951 0.35632772
 [43] 0.92370940 0.68766675 0.18955747 0.36391112 0.80192413 0.36258839
 [49] 0.29011077 0.14670362 0.99579798 0.05175098 0.35884144 0.13008490
 [55] 0.30077618 0.40366292 0.59405432 0.34703941 0.42183481 0.18129826
 [61] 0.18113927 0.91498303 0.17481851 0.28698659 0.59727188 0.62398008
 [67] 0.89397770 0.70922377 0.55310704 0.72845674 0.86857119 0.55345530
 [73] 0.45468271 0.27416655 0.85048578 0.45982760 0.35472970 0.61467887
 [79] 0.53689474 0.68820806 0.19445814 0.89028232 0.74803277 0.28935081
 [85] 0.48651768 0.44195594 0.65011445 0.73741668 0.56511210 0.30893682
 [91] 0.40463107 0.85530951 0.40549622 0.78985930 0.75361461 0.87634498
 [97] 0.33973988 0.72380865 0.44434577 0.78683044

$`3`
  [1] 0.04164657 0.72604658 0.20831547 0.13243575 0.80265352 0.47073934
  [7] 0.29639282 0.70595759 0.53646094 0.51174562 0.59255632 0.84780046
 [13] 0.21498119 0.51698816 0.04083199 0.17326961 0.15496821 0.70985822
 [19] 0.42371449 0.50003036 0.75766857 0.64715919 0.11470007 0.75030628
 [25] 0.97819268 0.97044724 0.73690979 0.92672044 0.95282911 0.38109264
 [31] 0.75752088 0.03905729 0.22969111 0.93989945 0.27348946 0.15759714
 [37] 0.56270339 0.94350497 0.92157562 0.51596471 0.43199969 0.51030123
 [43] 0.10597315 0.07020083 0.66155594 0.71934730 0.66435351 0.46840721
 [49] 0.88427251 0.24845608 0.84311400 0.54032405 0.93723332 0.28237555
 [55] 0.64329356 0.25709162 0.21474704 0.74977014 0.02476406 0.93897425
 [61] 0.48845579 0.27965893 0.77549641 0.50942212 0.24719611 0.62772163
 [67] 0.11846918 0.65089873 0.80457473 0.46300131 0.46885071 0.22399402
 [73] 0.52596980 0.51424049 0.69201288 0.33616774 0.58217470 0.53551750
 [79] 0.49731468 0.50122081 0.16005351 0.91442933 0.93589102 0.94142310
 [85] 0.69824316 0.77862653 0.51393128 0.27703298 0.90238933 0.57468830
 [91] 0.20635734 0.45797123 0.64620725 0.22895971 0.87649694 0.55918897
 [97] 0.42884507 0.99640809 0.75735302 0.62483765

Each of the above blocks is the result from one of the scenarios

Show the code
library(parallel)

numCores=detectCores()
cl=makeCluster(numCores)
simResults = parLapply(cl,scenariosList,ttestSim)
stopCluster(cl)

Now, we have a list simResults where each element of the list is 100 pvalues from the corresponding simulation scenarios from simScenarios.List.

Show the code
scenariosList[[7]]
  sampleSize trueMean direction
7         25     0.25 two.sided
Show the code
theTitle=with(scenariosList[[7]],c("n=",sampleSize,", truemean=",trueMean,", direction=",direction))|>paste(collapse="",sep="")
hist(simResults[[7]],main=theTitle)

load balancing and job order

When running in parallel and approximate job lengths are known, it’s you usually won’t have any cases faster than when you - ordering jobs longest to shortest - applying load balancing

Show the code
job = function(i){
  starttime = Sys.time()
  Sys.sleep(i/2)
  endtime = Sys.time()
  list(pid=Sys.getpid(),start=starttime,end=endtime)
}

library(parallel)
Show the code
cl=makeCluster(2)
parl.answers_1_10 = parLapply(cl=cl,1:10,job)
parl.answers_10_1 = parLapply(cl=cl,10:1,job)
parl.answers_1_10LB = parLapplyLB(cl=cl,1:10,job)
parl.answers_10_1LB = parLapplyLB(cl=cl,10:1,job)
stopCluster(cl)
Show the code
if (Sys.info()[["sysname"]]!="Windows"){
  #processor ID changes more often due to forking but total runtimes are comparable
  mc.answers_1_10  =  mclapply(1:10,job,mc.preschedule = T,mc.cores=2)
  mc.answers_10_1  =  mclapply(10:1,job,mc.preschedule = T,mc.cores=2)
  mc.answers_1_10LB = mclapply(1:10,job,mc.preschedule = F,mc.cores=2)
  mc.answers_10_1LB = mclapply(10:1,job,mc.preschedule = F,mc.cores=2)
}
Show the code
makeggplot = function(answers,maintitle=""){
  
  require(ggplot2)
  allpid   = sapply(answers,\(singleResult)singleResult$pid)
  allstart = sapply(answers,\(singleResult)singleResult$start)
  allend   = sapply(answers,\(singleResult)singleResult$end)
  
  
  
  # --- 1. Create a data frame (from your existing vectors) ---
  latesttime = max(allend)
  earliesttime = min(allstart)
  
  job_results_df = data.frame(
    job_id = factor(1:length(answers)),
    pid    = factor(allpid),           # This will be the y-axis aesthetic
    start  = allstart-earliesttime,
    end    = allend-earliesttime
  )
  
  
  
  runtime = round(latesttime-earliesttime,1)
  
  # --- 2. Plot the Gantt Chart with single color and vertical markers ---
  ggplot(job_results_df, 
         aes(x = start, 
             xend = end, 
             y = pid, 
             yend = pid)) +
    
    geom_segment(linewidth = 4, color = "#1f78b4") + 
    
    
    geom_segment(aes(x = end, xend = end, 
                     y = as.numeric(pid) - 0.2, yend = as.numeric(pid) + 0.2), 
                 linewidth = 1.5, color = "#e31a1c") +   # Red for End
    
    # Add labels and title
    labs(
      title = paste(maintitle, runtime, "seconds"),
      x = "Time",
      y = "Process ID (PID)"
    ) +
    # Use a clean theme
    # Remove redundant grid lines
    theme(
      panel.grid.major.y = element_blank()
    )
}

p1.parl <- makeggplot(parl.answers_1_10,"normal shortest to longest")
p2.parl <- makeggplot(parl.answers_10_1,"normal longest to shortest")
p3.parl <- makeggplot(parl.answers_1_10LB,"balanced shortest to longest") 
p4.parl <- makeggplot(parl.answers_10_1LB ,"balanced shortest to longest")

require(patchwork)

(p1.parl + p2.parl) / (p3.parl + p4.parl)

Show the code
if(Sys.info()[["sysname"]]!="Windows"){
  
  p1.mc <- makeggplot(mc.answers_1_10,"normal shortest to longest")
  p2.mc <- makeggplot(mc.answers_10_1,"normal longest to shortest")
  p3.mc <- makeggplot(mc.answers_1_10LB,"balanced shortest to longest") 
  p4.mc <- makeggplot(mc.answers_10_1LB ,"balanced shortest to longest")
  
  require(patchwork)
  
  (p1.mc + p2.mc) / (p3.mc + p4.mc)
  
  
}

Below is an example from parlapply on my server: parlapplysample Below is an example from mclapply on my server: mclapplysample

These were both run on the same machine. I don’t know by the mclapply ones are much more consistent and it generally doesn’t matter. (closer to 14/15 seconds whereas the parlapply ones vary the way I expect them to). When